Resolution adjustment of an image that includes text undergoing an OCR process

ABSTRACT

An optical character recognition process characterizes text lines in a textual image by their base-line, mean-line and x-height. The base-line for at least one text line in the image is determined by finding a parametric curve that maximizes a first fitness function that depends on the values of the pixels through which the parametric curve passes and of the pixels below the parametric curve. The base-line corresponds to the parametric curve for which the first fitness function is maximized. The first fitness function is designed so that it increases with increasing lightness (brightness) of the pixels immediately below the parametric curve and also increases with decreasing lightness of the pixels through which the parametric curve passes. The mean-line is determined by incrementally shifting the base-line upward by predetermined amounts (e.g., a single pixel) until a second fitness function for the shifted base-line is maximized. The second fitness function is essentially the inverse of the first fitness function: it increases with increasing lightness of the pixels immediately above the shifted base-line and also increases with decreasing lightness of the pixels through which the shifted base-line passes. The x-height is equal to the sum of the predetermined amounts by which the base-line is shifted upward in order to maximize the second fitness function. In some cases, different groups of text lines in the textual image may be characterized differently from one another. For example, each group may be characterized by a most probable x-height for that group.

BACKGROUND

Optical character recognition (OCR) is a computer-based translation of an image of text into digital form as machine-editable text, generally in a standard encoding scheme. This process eliminates the need to manually type the document into the computer system. A number of different problems can arise due to poor image quality, imperfections caused by the scanning process, and the like. For example, a conventional OCR engine may be coupled to a flatbed scanner which scans a page of text. Because the page is placed flush against a scanning face of the scanner, an image generated by the scanner typically exhibits even contrast and illumination, reduced skew and distortion, and high resolution. Thus, the OCR engine can easily translate the text in the image into the machine-editable text. However, when the image is of lesser quality with regard to contrast, illumination, skew, etc., performance of the OCR engine may be degraded and the processing time may be increased due to processing of all pixels in the image. This may be the case, for instance, when the image is obtained from a book or when it is generated by an imager-based scanner, because in these cases the text/picture is scanned from a distance, from varying orientations, and in varying illumination. Even if the scanning process itself performs well, the performance of the OCR engine may be degraded when a relatively low-quality page of text is being scanned.

SUMMARY

Optical character recognition requires the identification of the text lines in the textual image in order to identify individual words and characters. The text lines can be characterized by their base-line, mean-line and x-height. Determining these features may become difficult when the text lines are not perfectly horizontal, which may arise when scanning some classes of documents (for example, a thick book) in which the image suffers from non-linear distortions. In such a case, the base-line and mean-line may not be constant over an entire text line.

To overcome these problems, in one implementation the base-line for at least one text line in the image is determined by finding a parametric curve that maximizes a first fitness function that depends on the values of the pixels through which the parametric curve passes and of the pixels below the parametric curve. The base-line corresponds to the parametric curve for which the first fitness function is maximized. The first fitness function is designed so that it increases with increasing lightness (brightness) of the pixels immediately below the parametric curve and also increases with decreasing lightness of the pixels through which the parametric curve passes.

In some implementations the mean-line can be determined by incrementally shifting the base-line upward by predetermined amounts (e.g., a single pixel) until a second fitness function for the shifted base-line is maximized. The second fitness function is essentially the inverse of the first fitness function. Specifically, the second fitness function increases with increasing lightness of the pixels immediately above the shifted base-line and also increases with decreasing lightness of the pixels through which the shifted base-line passes.

In some implementations the x-height can be determined from the base-line and the mean-line which have already been calculated. In particular, the x-height is equal to the sum of the predetermined amounts by which the base-line is shifted upward in order to maximize the second fitness function.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows one illustrative example of a system 5 for optical character recognition (OCR) in an image.

FIG. 2 shows an example of a text-line in a scanned image which is not perfectly horizontal.

FIG. 3 illustrates the base-line for the text-line of a single word in a scanned image.

FIG. 4 is a flowchart illustrating a process of determining the x-height for different groups of text lines.

FIG. 5 shows one example of an image processing apparatus that may perform the process of extracting information concerning the text-lines in a textual image.

DETAILED DESCRIPTION

FIG. 1 shows one illustrative example of a system 5 for optical character recognition (OCR) in an image which includes a data capture arrangement (e.g., a scanner 10) that generates an image of a document 15. The scanner 10 may be an imager-based scanner which utilizes a charge-coupled device as an image sensor to generate the image. The scanner 10 processes the image to generate input data, and transmits the input data to a processing arrangement (e.g., an OCR engine 20) for character recognition within the image. In this particular example the OCR engine 20 is incorporated into the scanner 10. In other examples, however, the OCR engine 20 may be a separate unit, such as a stand-alone unit or a unit that is incorporated into another device such as a PC, server, or the like.

The OCR engine 20 receives a textual image as a bitmap of text lines. Three parameters of those text lines that need to be determined are the “base-line,” the “mean-line,” and the “x-height.” First, the “base-line” is defined as a horizontal line passing through the bottom ends of a majority of the characters in a line of text (excluding descenders). Second, the “mean-line” is defined as a horizontal line which passes through the top ends of a majority of the characters in a line of text (excluding ascenders). Third, the “x-height” is defined as the vertical distance between the base-line and the mean-line, which corresponds to the height of a majority of the lowercase letters in the line (i.e., lowercase letters that have neither ascenders nor descenders).

Knowing the precise base-line and x-height is important for a number of reasons, particularly for differentiating between capital and lowercase letters of the same shape. If a text-line is perfectly horizontal and contains only one font style and size, the base-line and x-height will hold a constant value over the entire line. Computing these values for a perfectly horizontal text-line is not a difficult task. However, when scanning some classes of documents (for example, a thick book), the document image can suffer from non-linear distortions. In such a case, the base-line coordinate will not be constant over an entire text line.

An example of a text line containing this artifact is shown in FIG. 2. The text has a “wavy” appearance, which is caused by the average letter position dropping from the line's middle towards its left or right end. Artifacts of this nature make it more difficult to determine the base-line.

Extracting x-height information from a textual image can also be problematic. For instance, sometimes a majority of a text line (or even an entire text line) is composed of capital letters or numbers. In such a case, extracting the x-height using the line's bitmap as the sole source of information is not reliable. FIG. 2 also shows the base-line, mean-line and x-height.

As detailed below, a method is provided to compute the base-line of a deformed text-line in the form of a parametric curve. Moreover, the most probable x-height value of a given line is estimated using context information obtained from the entire image.

Base-Line Computation

At the outset, two observations can be made from the base-line's definition:

-   Due to the nature of most fonts, the base-line will overlap with a significant number of dark pixels originating from letter bottoms.
-   Immediately below the base-line there are no dark pixels (except for descending letter parts).

Regardless of whether the base-line is strictly horizontal or (in the case of non-linear deformations) “wavy,” it should be possible to establish a simple fitness function based on at least two properties that follow from these observations:

-   Property 1: As the pixels immediately below the base-line become lighter (i.e., brighter), the value of the fitness function increases (and vice versa).
-   Property 2: As the pixels overlapping with the base-line become darker, the value of the fitness function increases (and vice versa).

The goal of finding the base-line in a given text-line bitmap thus translates to the problem of finding a (curved) line with a maximal fitness function value.

A rasterized base-line can be implemented as an array: for each x-coordinate of an input bitmap, there should be one and only one y-coordinate describing the local base-line value. Keeping this in mind, a simple proposal for the fitness function is:

${{fitness}({baseline})} = {{\sum\limits_{x = 0}^{{width} - 1}\; {{img}\left\lbrack {{{{baseline}\lbrack x\rbrack} + 1},x} \right\rbrack}} - {\sum\limits_{x = 0}^{{width} - 1}{{img}\left\lbrack {{{baseline}\lbrack x\rbrack},x} \right\rbrack}}}$

Where:

-   x and y are the horizontal and vertical pixel coordinates, respectively (with the origin at the top-left corner, so y increases downward)
-   img[y, x] is the input bitmap's pixel value at location (y, x)
-   width is the input bitmap's width
-   baseline[x] is the y-coordinate of the base-line at position x

It can be observed that this formula for the fitness function satisfies both Property 1 and Property 2. Since the pixel colors in a typical gray-scale image vary from black (value: 0) to white (value: 255), the following will hold true:

-   As the pixels immediately below the base-line become lighter, the first term in the formula (the first sum) becomes larger.
-   As the pixels on the base-line become darker, the second sum becomes smaller, so the second term (the negated sum) becomes larger.
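As a rough illustration, the fitness formula above might be computed as follows. This is a minimal sketch, assuming an 8-bit gray-scale bitmap stored as a list of rows (so the document's img[y, x] becomes `img[y][x]`) and a base-line given as one y-coordinate per column; the function name `baseline_fitness` is hypothetical.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white; origin top-left, y grows downward


def baseline_fitness(img: Image, baseline: Sequence[int]) -> int:
    """Sum of the pixels just below the curve minus the sum of the pixels on the curve."""
    height, width = len(img), len(img[0])
    if len(baseline) != width:
        raise ValueError("baseline needs exactly one y-coordinate per column")
    below = on = 0
    for x, y in enumerate(baseline):
        if not 0 <= y < height - 1:
            raise ValueError(f"baseline[{x}] = {y} leaves no row below the curve")
        below += img[y + 1][x]  # lighter pixels below raise the score (Property 1)
        on += img[y][x]         # darker pixels on the curve raise the score (Property 2)
    return below - on
```

For a 3-row, 4-column bitmap with a single black row at y = 1, the curve lying on the black row scores 1020 (4 × 255), while the curve one row higher scores -1020.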

A simple diagram of text illustrating this idea is presented in FIG. 3. It can be observed from FIG. 3 that the base-line overlaps with a relatively large number of dark pixels, while the pixels immediately below the base-line are completely white.

After defining the criteria that the base-line should fulfill, another question that arises is how “fast” the base-line should be allowed to change across the text-line width when maximizing the fitness function. Clearly, this rate of change should be sufficient to track the line's “waviness.”

On the other hand, the rate of change should not be too fast, because the bottoms of descending characters should not affect the base-line shape. One way to address this issue is to define a base-line candidate through a small set of control parameters and to limit the range of values each parameter can take. In this way the shape of the base-line candidate can be changed by changing its control parameters, while the limits on those parameters bound how quickly the curve can change.

A curve maximizing the fitness function can be parameterized by defining it through a set of control points connected with straight line segments. The curve's shape can be varied by moving its control points. One way to control the movement of the control points in a manner that achieves good performance is to allow each control point to move only in the vertical direction. Experience with this approach has shown that a set of 4-6 equidistant control points does a good job of modeling a common “wavy” base-line.
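The control-point parameterization could be rasterized into the per-column array form used by the fitness function roughly as follows. This sketch assumes the control points are equidistant, span the full bitmap width, and move only vertically (so only their y-coordinates are parameters); `polyline_baseline` is a hypothetical name.

```python
from typing import List, Sequence


def polyline_baseline(control_ys: Sequence[float], width: int) -> List[int]:
    """Rasterize a curve through equidistant control points joined by straight segments.

    control_ys[i] is the (vertically movable) y-coordinate of control point i; the
    x-positions are fixed and spread evenly from column 0 to column width - 1.
    """
    n = len(control_ys)
    if n < 2 or width < 2:
        raise ValueError("need at least two control points and two columns")
    step = (width - 1) / (n - 1)  # horizontal spacing between control points
    baseline = []
    for x in range(width):
        seg = min(int(x / step), n - 2)  # index of the segment containing column x
        t = (x - seg * step) / step      # position within that segment, 0..1
        y = control_ys[seg] + t * (control_ys[seg + 1] - control_ys[seg])
        baseline.append(int(round(y)))  # snap to the nearest pixel row
    return baseline
```

For example, `polyline_baseline([0, 10, 0], 5)` yields `[0, 5, 10, 5, 0]`. Restricting each entry of `control_ys` to a narrow window of rows is one way to bound the curve's slope, in the spirit of limiting the range of each control parameter.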

A second way of parameterizing the curve for the base-line is to define it as a B-spline. Its shape can then be changed by varying the spline coefficients.
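A B-spline base-line could be evaluated column by column along these lines. The document does not specify the spline degree or knot layout, so this sketch assumes a uniform cubic B-spline whose parameter range is stretched across the bitmap width; `bspline_baseline` is a hypothetical name. Unlike the polyline, the curve generally does not pass through its coefficients, which keeps it smooth.

```python
from typing import List, Sequence


def bspline_baseline(coeffs: Sequence[float], width: int) -> List[int]:
    """Rasterize a uniform cubic B-spline with the given coefficients, one y per column.

    The parameter u runs from 0 to len(coeffs) - 3 across the bitmap width; each unit
    interval of u is one spline segment controlled by four consecutive coefficients.
    """
    m = len(coeffs)
    if m < 4 or width < 2:
        raise ValueError("need at least four coefficients and two columns")
    n_seg = m - 3
    baseline = []
    for x in range(width):
        u = x * n_seg / (width - 1)
        i = min(int(u), n_seg - 1)  # segment index
        t = u - i                   # local parameter in [0, 1]
        # Uniform cubic B-spline basis functions (they sum to 1 for every t).
        b0 = (1 - t) ** 3 / 6
        b1 = (3 * t**3 - 6 * t**2 + 4) / 6
        b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
        b3 = t**3 / 6
        c = coeffs[i : i + 4]
        y = b0 * c[0] + b1 * c[1] + b2 * c[2] + b3 * c[3]
        baseline.append(int(round(y)))
    return baseline
```

Because the basis functions sum to one, a constant coefficient vector yields a flat base-line, and linearly increasing coefficients yield a straight sloped line.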

In general, finding the exact shape that maximizes a fitness function can be treated as a classical optimization problem which can be solved using well-known techniques. Depending on the nature and number of parameters used to describe the base-line curve, a genetic search, dynamic programming, or some other technique can be used.

If a genetic search is performed, an initial population can be a set of curves with parameters randomly set within some reasonable range. New offspring can be formed by taking two high-fitness curves and mixing their parameters into a new curve. Mutation can be performed by slightly varying curve parameters.
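A genetic search over control-point heights might look like the following sketch. It repeats a piecewise-linear rasterization and the base-line fitness so the block is self-contained. The population size, number of generations, Gaussian mutation strength, keeping the fitter half each generation, and per-parameter crossover are all illustrative assumptions rather than details from the document; `genetic_baseline` is a hypothetical name.

```python
import random
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white, y grows downward


def rasterize(ctrl: Sequence[float], width: int) -> List[int]:
    """Piecewise-linear curve through equidistant control points, one y per column."""
    n = len(ctrl)
    step = (width - 1) / (n - 1)
    out = []
    for x in range(width):
        i = min(int(x / step), n - 2)  # segment containing column x
        t = (x - i * step) / step
        out.append(int(round(ctrl[i] + t * (ctrl[i + 1] - ctrl[i]))))
    return out


def fitness(img: Image, baseline: Sequence[int]) -> int:
    """Pixels just below the curve minus pixels on the curve (higher is better)."""
    return sum(img[y + 1][x] - img[y][x] for x, y in enumerate(baseline))


def genetic_baseline(img: Image, n_ctrl: int = 5, pop_size: int = 40,
                     generations: int = 60, mutation: float = 0.5,
                     seed: int = 0) -> List[int]:
    """Search control-point heights with a simple genetic algorithm."""
    rng = random.Random(seed)
    height, width = len(img), len(img[0])
    lo, hi = 0.0, float(height - 2)  # every curve row must have a row below it

    def score(ctrl: List[float]) -> int:
        return fitness(img, rasterize(ctrl, width))

    # Initial population: control-point heights drawn at random in the valid range.
    pop = [[rng.uniform(lo, hi) for _ in range(n_ctrl)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: mix the two parents' parameters into a new curve.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: slightly vary each parameter, clamped to the valid range.
            child = [min(hi, max(lo, c + rng.gauss(0.0, mutation))) for c in child]
            children.append(child)
        pop = parents + children
    return rasterize(max(pop, key=score), width)
```

Because the fitter half is carried over unchanged, the best fitness in the population never decreases from one generation to the next; the fixed seed makes runs reproducible.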

The curve parameters can be optimized by dynamic programming as well. The solution requires finding an optimal path starting at the text-line's left side and moving towards its right side, while obeying the spatial constraints imposed by the common curve shape.
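One way to realize the dynamic-programming variant is a Viterbi-style scan that treats each column's base-line row as a state. In this sketch the spatial constraint is assumed to be a maximum vertical step `max_step` between neighboring columns, and the per-column gain is the same below-minus-on pixel difference as in the fitness formula, so a path's total gain equals its fitness; `dp_baseline` is a hypothetical name.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white, y grows downward


def dp_baseline(img: Image, max_step: int = 1) -> List[int]:
    """Left-to-right dynamic programming for the maximum-fitness base-line path."""
    height, width = len(img), len(img[0])
    rows = range(height - 1)  # candidate rows that still have a row below them

    def gain(x: int, y: int) -> int:
        return img[y + 1][x] - img[y][x]  # this column's contribution to the fitness

    best = [gain(0, y) for y in rows]  # best[y]: best score of a path ending at (x, y)
    back = []                          # back[x - 1][y]: predecessor row of (x, y)
    for x in range(1, width):
        new_best, choice = [], []
        for y in rows:
            # Only predecessors within max_step rows keep the curve smooth.
            lo, hi = max(0, y - max_step), min(height - 2, y + max_step)
            prev = max(range(lo, hi + 1), key=lambda p: best[p])
            new_best.append(best[prev] + gain(x, y))
            choice.append(prev)
        best = new_best
        back.append(choice)
    # Trace the optimal path back from the best row in the last column.
    y = max(rows, key=lambda r: best[r])
    path = [y]
    for choice in reversed(back):
        y = choice[y]
        path.append(y)
    return path[::-1]
```

With `max_step=0` the search degenerates to the best straight horizontal line; larger values let the curve follow stronger waviness, at a greater risk of being pulled down by descenders.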

X-Height Computation

The mean-line (a line determining where non-ascending lowercase letters terminate) can be computed in a way quite similar to the base-line computation procedure described above. In fact, it is enough to invert the fitness function described above and re-run the algorithm. That is, the fitness function for the mean-line should satisfy the following two properties:

-   Property 1: As the pixels immediately above the mean-line become lighter, the value of the fitness function increases (and vice versa).
-   Property 2: As the pixels overlapping with the mean-line become darker, the value of the fitness function increases (and vice versa).
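Inverting the fitness function amounts to looking one row above the curve instead of one row below it. A minimal sketch, under the same bitmap assumptions as for the base-line (a list of rows, 0 = black, 255 = white, y growing downward); `meanline_fitness` is a hypothetical name.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white, y grows downward


def meanline_fitness(img: Image, meanline: Sequence[int]) -> int:
    """Inverted fitness: pixels just above the curve minus pixels on the curve."""
    total = 0
    for x, y in enumerate(meanline):
        if y < 1:
            raise ValueError(f"meanline[{x}] = {y} leaves no row above the curve")
        # Light pixels above (Property 1) and dark pixels on the curve (Property 2)
        # both increase the score.
        total += img[y - 1][x] - img[y][x]
    return total
```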

Once the mean-line is determined, the x-height can then be extracted by simply subtracting the corresponding mean-line and base-line coordinates. However, this process introduces an additional computational load, effectively doubling the execution time of the entire feature extraction.

In practice, non-linear deformations of the type discussed herein have no influence on individual letter dimensions. In other words, the x-height does not change across the “wavy” text-line, provided that the line contains letters of the same font style and size. This observation simplifies the computation of the x-height, since it directly implies that the curves for the mean-line and the base-line will have exactly the same shape. Accordingly, the mean-line can be computed in the following way: the curve for the base-line is shifted pixel by pixel towards the text-line's top, and the inverted fitness function is computed each time the curve is shifted upward. The shifted curve for which the inverted fitness function reaches its maximum value is the mean-line. The number of pixels by which the base-line curve is shifted upward to obtain the mean-line is equal to the x-height.
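The shifting procedure could be sketched as below. It assumes the base-line has already been found, tries every upward shift that keeps a row above the shifted curve, and keeps the shift with the highest inverted fitness (an exhaustive scan over that range, which is one reading of “reaches its maximum value”); `x_height_by_shifting` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white, y grows downward


def meanline_fitness(img: Image, curve: Sequence[int]) -> int:
    """Inverted fitness: pixels just above the curve minus pixels on the curve."""
    return sum(img[y - 1][x] - img[y][x] for x, y in enumerate(curve))


def x_height_by_shifting(img: Image, baseline: Sequence[int]) -> Tuple[int, List[int]]:
    """Shift the base-line up one pixel at a time; return (x-height, mean-line)."""
    best_j, best_score = 0, None
    # The curve keeps its shape; only a uniform upward shift j is tried.
    for j in range(1, min(baseline)):  # stop before any column loses the row above it
        score = meanline_fitness(img, [y - j for y in baseline])
        if best_score is None or score > best_score:
            best_j, best_score = j, score
    if best_score is None:
        raise ValueError("base-line is too close to the top of the bitmap")
    return best_j, [y - best_j for y in baseline]
```

Only one curve shape is evaluated per shift, so the cost is one fitness evaluation per candidate x-height rather than a second full curve optimization.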

Sometimes the input bitmap of an individual text-line cannot be used as the only source of information for obtaining a single x-height value over an entire image. For example, some text lines may be short lines composed of numbers only. Another example is a caption that is in all capital letters. Because of such cases, the x-height computation may sometimes be performed in a somewhat more sophisticated manner.

In this implementation, before computing the x-height, it is determined whether the text-lines in the image should be divided into different groups that are each likely to contain text-lines with different x-heights. Such text-line groups may be determined in a variety of ways. For instance, text-lines may be grouped according to their dominant letter stroke width. This approach essentially assumes that different x-heights arise from the use of different fonts and font sizes, and that each such font and font size is characterized by a different dominant stroke width. Thus, text lines that share a common dominant stroke width are likely to share a common x-height.
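Given a dominant stroke width for each text line (computed elsewhere in the OCR pipeline), the lines could be grouped with a simple one-dimensional clustering such as the following. The gap-based rule and the `max_gap` threshold are assumptions made for illustration; `group_by_stroke_width` is a hypothetical name.

```python
from typing import List, Sequence


def group_by_stroke_width(stroke_widths: Sequence[float], max_gap: float = 0.5) -> List[List[int]]:
    """Group text-line indices whose dominant stroke widths are close together.

    Lines are sorted by stroke width, and a new group starts wherever two
    consecutive widths differ by more than max_gap (a hypothetical threshold).
    """
    order = sorted(range(len(stroke_widths)), key=lambda i: stroke_widths[i])
    groups: List[List[int]] = []
    prev = None
    for i in order:
        w = stroke_widths[i]
        if prev is None or w - prev > max_gap:
            groups.append([])  # a clearly different stroke width starts a new group
        groups[-1].append(i)
        prev = w
    return groups
```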

The dominant stroke width may be determined at this stage of the OCR process, or it may have been determined in an earlier stage of processing which precedes the text line analysis described herein. One example of a method for determining stroke width is shown in U.S. patent application Ser. No. ______ [Docket No. 328299.01], which is hereby incorporated by reference in its entirety.

In one alternative, instead of grouping text lines by their dominant stroke width, individual words can be divided into their own groups.

To determine the x-height of a particular group, begin by defining mean-line candidate[j] as the base-line's curve shifted up by j pixels. Next, for each group, a common buffer is established. For each text-line in a group, the inverted fitness function of mean-line candidate[j] is added to element j of the buffer. At the end of the process, the buffer's element j will contain the sum of the inverted fitness functions of all the mean-line candidates[j] within the particular group. The most probable x-height value for the particular group corresponds to the value of j for which the buffer has its maximum value.
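The shared accumulation buffer could be sketched as follows. Each text line is assumed to come with its own bitmap and an already-computed base-line; shifts that would run off the top of a line's bitmap simply contribute nothing for that line, and the search range `max_j` is an assumed parameter. `group_x_height` is a hypothetical name.

```python
from typing import Sequence, Tuple

Image = Sequence[Sequence[int]]  # img[y][x]: 0 = black, 255 = white, y grows downward


def group_x_height(lines: Sequence[Tuple[Image, Sequence[int]]], max_j: int) -> int:
    """Most probable x-height for one group of text lines.

    lines holds (bitmap, base-line) pairs; buffer[j] accumulates, over all lines,
    the inverted fitness of mean-line candidate[j] (the base-line shifted j rows up).
    """
    buffer = [0] * (max_j + 1)  # common accumulation buffer, initialized to zero
    for img, baseline in lines:
        for j in range(1, max_j + 1):
            if min(baseline) - j - 1 < 0:
                break  # candidate[j] would leave no row above it in this bitmap
            buffer[j] += sum(img[y - j - 1][x] - img[y - j][x]
                             for x, y in enumerate(baseline))
    # The value of j at which the buffer peaks is the group's most probable x-height.
    return max(range(1, max_j + 1), key=lambda j: buffer[j])
```

In a group of two lowercase lines and one all-capital line, the lowercase lines' peak dominates the summed buffer, which is the situation the grouping is meant to handle.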

A flowchart illustrating the process of determining the x-height for different groups of text lines is shown in FIG. 4. The process begins in step 105 when the text-lines in an image are divided into groups by any appropriate criterion such as font size, dominant stroke width, or the like. For each group, the process continues from step 110 to step 115, in which an accumulation buffer is established and initialized to a value of zero. Next, for each text line within the group the process proceeds from step 120 to step 125, in which j is initialized to zero and the mean-line candidate is initialized to the base-line. The value of j is incremented by 1 in step 135, which corresponds to shifting the base-line curve upward by one pixel. The fitness function for this mean-line (which corresponds to the inverse fitness function of the base-line) is calculated in step 140. Also in step 140, the accumulation buffer is updated to the sum of its previous value and the value of the fitness function that has just been calculated. Decision step 145 then determines if the maximum value of the accumulation buffer has been reached. If so, then in step 150 the current value of j corresponding to this maximum value is determined to be the x-height for this group. Alternatively, if the maximum value of the accumulation buffer has not been reached, the process proceeds from decision step 145 back to step 130, in which the current mean-line is shifted upward by 1 pixel. This process continues until the maximum value of the accumulation buffer has been reached. Once the x-height value has been determined for this group, the process returns to step 120 and repeats for any remaining groups of text lines, finally ending at step 155.

FIG. 5 shows one example of an image processing apparatus 300 that may perform the process of extracting information concerning the text-lines in a textual image. The apparatus, which may be incorporated in an OCR engine, can be used by the OCR engine to determine the base-line, mean-line and x-height of the text lines in the image. The apparatus includes an input component 302 for receiving an input image and a parameterizing engine 310 for finding parametric curves that correspond to the base-line and the mean-line of the text lines in the image. The parameterizing engine 310 includes a base-line determination component 322, a mean-line determination component 324 and an x-height determination component 324. The apparatus 300 also includes an output component 330 that generates the information concerning the text-lines in a form that allows it to be employed by subsequent components of the OCR engine.

As used in this application, the terms “component,” “module,” “engine,” “system,” “apparatus,” “interface,” or the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.

Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

1. A system that extracts information which characterizes text lines in an image, comprising: an input component for receiving a bitmap of an input image that includes text lines; and a parameterizing engine that determines a base-line for at least one text line in the image by finding a parametric curve that maximizes a fitness function that depends on values of pixels through which the parametric curve passes and pixels below the parametric curve, wherein the base-line corresponds to the parametric curve for which the fitness function is maximized.
2. The system of claim 1 wherein the fitness function is defined as fitness(baseline) and is equal to $\mathrm{fitness}(\mathit{baseline}) = \sum_{x=0}^{\mathit{width}-1} \mathit{img}[\mathit{baseline}[x]+1,\, x] - \sum_{x=0}^{\mathit{width}-1} \mathit{img}[\mathit{baseline}[x],\, x]$ where: x and y are horizontal and vertical pixel coordinates, respectively; img[y, x] is a pixel value of the bitmap at location (y, x); width is a width of the input image bitmap; and baseline[x] is a y-coordinate of the base-line at position x.
3. The system of claim 1 wherein at least one control parameter constrains at least one feature of the parametric curve.
4. The system of claim 3 wherein the feature of the parametric curve that is determined by the control parameter is a maximum rate of change of the parametric curve along the text line.
5. The system of claim 1 wherein the parametric curve includes a plurality of control points connected by straight lines, wherein the control points are constrained to move only in a vertical direction.
6. The system of claim 1 wherein the parametric curve is defined as a B-spline having a shape that is determined by its spline coefficients.
7. The system of claim 1 wherein the fitness function is maximized using an optimization technique selected from the group consisting of a genetic search and dynamic programming.
8. The system of claim 1 wherein the parameterizing engine further comprises a mean-line determination component that determines a mean-line for the at least one line of text.
9. The system of claim 8 wherein the mean-line determination component determines the mean-line by maximizing a second fitness function for a second parametric curve, wherein the second fitness function increases with increasing lightness of pixels immediately above the second parametric curve and also increases with decreasing lightness of pixels through which the second parametric curve passes.
10. The system of claim 9 wherein the mean-line determination component determines the mean-line by: incrementally shifting the base-line upward by predetermined amounts until a second fitness function for the shifted base-line is maximized, wherein the second fitness function increases with increasing lightness of pixels immediately above the shifted base-line and also increases with decreasing lightness of pixels through which the shifted base-line passes.
11. The system of claim 10 further comprising an x-height determination component that determines an x-height for the at least one text line, wherein the x-height is equal to a sum of the predetermined amounts by which the base-line is shifted upward in order to maximize the second fitness function.
12. The system of claim 1 wherein the parameterizing engine determines a different x-height for different groups of text-lines in the input image.
13. The system of claim 1 wherein the parameterizing engine determines a most probable x-height for different groups of text-lines in the input image based on an inverted fitness function of a mean-line candidate for each text-line in the group.
14. The system of claim 12 wherein the parameterizing engine divides the text-lines in the input image into groups based on their dominant stroke-width.
15. A method for extracting information which characterizes text lines in an image, comprising: receiving a bitmap of an input image that includes text lines; and determining a base-line for at least one text line in the image by finding a parametric curve that maximizes a fitness function that depends on values of pixels through which the parametric curve passes and pixels below the parametric curve, wherein the base-line corresponds to the parametric curve for which the fitness function is maximized.
16. The method of claim 15 wherein the fitness function increases with increasing lightness of pixels immediately below the parametric curve and also increases with decreasing lightness of pixels through which the parametric curve passes.
17. The method of claim 16 wherein the fitness function is defined as fitness(baseline) and is equal to $\mathrm{fitness}(\mathit{baseline}) = \sum_{x=0}^{\mathit{width}-1} \mathit{img}[\mathit{baseline}[x]+1,\, x] - \sum_{x=0}^{\mathit{width}-1} \mathit{img}[\mathit{baseline}[x],\, x]$ where: x and y are horizontal and vertical pixel coordinates, respectively; img[y, x] is a pixel value of the bitmap at location (y, x); width is a width of the input image bitmap; and baseline[x] is a y-coordinate of the base-line at position x.
18. The method of claim 17 further comprising maximizing the fitness function using an optimization technique selected from the group consisting of a genetic search and dynamic programming.
19. The method of claim 15 further comprising determining a mean-line for the at least one line of text by incrementally shifting the base-line upward by predetermined amounts until a second fitness function for the shifted base-line is maximized, wherein the second fitness function increases with increasing lightness of pixels immediately above the shifted base-line and also increases with decreasing lightness of pixels through which the shifted base-line passes.
20. The method of claim 19 further comprising determining an x-height for the at least one text line, wherein the x-height is equal to a sum of the predetermined amounts by which the base-line is shifted upward in order to maximize the second fitness function.